Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing
Abstract
Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus on detecting anomalies rather than repairing them. When dirty data is simply filtered out via anomaly detection, applications may still be unreliable over the resulting incomplete time series. Instead of discarding anomalies, we propose to (iteratively) repair them in time series data, by combining the temporal nature exploited in anomaly detection with the widely considered minimum change principle in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) explicit analysis of the convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate the superiority of our proposal compared to state-of-the-art approaches. In particular, we show that the proposed repairing indeed improves the time series classification application.
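The O(n) to O(1) claim in the abstract can be illustrated with a generic sketch. This is not the paper's algorithm: it assumes a plain first-order autoregressive fit z[t] ~ phi * z[t-1] estimated by least squares, and the names `RunningAR1`, `from_series`, and `update_point` are hypothetical. The idea shown is only that when a single point is repaired, the sums behind the estimate can be adjusted by touching the neighbouring terms instead of rescanning the whole series.

```python
from dataclasses import dataclass


@dataclass
class RunningAR1:
    """Least-squares estimate of phi in z[t] ~ phi * z[t-1], kept as two sums.

    Illustrative assumption only; the paper's actual model may differ.
    """
    sxy: float = 0.0  # sum over t of z[t-1] * z[t]
    sxx: float = 0.0  # sum over t of z[t-1] ** 2

    @classmethod
    def from_series(cls, z):
        # Full O(n) pass, done once up front.
        est = cls()
        for prev, cur in zip(z, z[1:]):
            est.sxy += prev * cur
            est.sxx += prev * prev
        return est

    def phi(self):
        # Guard against an all-zero predictor column.
        return self.sxy / self.sxx if self.sxx else 0.0

    def update_point(self, z, k, new_value):
        """Repair z[k] in place and adjust the sums in O(1)."""
        old = z[k]
        if k > 0:
            # z[k] is the response in the term z[k-1] * z[k].
            self.sxy += z[k - 1] * (new_value - old)
        if k < len(z) - 1:
            # z[k] is the predictor in the term z[k] * z[k+1].
            self.sxy += (new_value - old) * z[k + 1]
            self.sxx += new_value ** 2 - old ** 2
        z[k] = new_value


series = [1.0, 2.0, 9.0, 4.0, 5.0]   # made-up data; 9.0 plays the anomaly
estimate = RunningAR1.from_series(series)
estimate.update_point(series, 2, 3.0)  # one repair, constant-time update
print(estimate.phi())
```

Each repair changes at most three terms of the two sums, so the cost of re-estimating after a repair does not grow with the series length. A fit with more lags would need more running sums, but the same bookkeeping applies.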
Similar resources
Thermal anomalies detection before earthquake using three filters (Fourier, Wavelet and Logarithmic Differential Filter), A Case Study of two Earthquakes in Iran
Earthquakes are among the most destructive natural phenomena, causing human and financial losses. An efficient prediction and early warning system would help reduce the effects of destructive earthquakes. In this research, the soil temperature time-series data, obtained from three meteorological stations, using three filters (Fourier, Wavelet and Logarithmic Different...
Trends in Cleaning Relational Data: Consistency and Deduplication
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. Poor data across businesses and the government cost the U.S. economy $3.1 trillion a year, according to a report by InsightSquared in 2012. To detect data errors, data quality rules or integrity constraints (ICs) have been propose...
Behavior-Based Online Anomaly Detection for a Nationwide Short Message Service
As fraudsters understand the time window and act fast, real-time fraud management systems become necessary in the telecommunication industry. In this work, by analyzing traces collected from a nationwide cellular network over a period of a month, an online behavior-based anomaly detection system is provided. Over time, users' interactions with the network provide a vast amount of usage data. Thes...
Dynamic anomaly detection by using incremental approximate PCA in AODV-based MANETs
Mobile Ad-hoc Networks (MANETs) are more vulnerable than other networks because of inherent properties such as dynamic topology and the lack of infrastructure. A considerable challenge for these networks is therefore to develop a method that can identify anomalies with high accuracy as the network topology changes. In this paper, two methods proposed for dynamic anom...
The detection of 11th of March 2011 Tohoku's TEC seismo-ionospheric anomalies using the Singular Value Thresholding (SVT) method
The Total Electron Content (TEC) measured by the Global Positioning System (GPS) is useful for registering the pre-earthquake ionospheric anomalies appearing before a large earthquake. In this paper the TEC value was predicted using the singular value thresholding (SVT) method. Also, the anomaly is detected utilizing this predicted value and the definition of the threshold value, leading to the...
Journal title:
- PVLDB
Volume 10, Issue
Pages -
Publication date 2017